Add a HNSW collector that exits early when nearest neighbor queue saturates #14094

tteofili · 2025-01-02T09:29:29Z

This introduces an HnswKnnCollector interface, extending KnnCollector for HNSW, to make it possible to hook into HNSW execution for optimizations.
It then adds a new collector that uses a saturation-based threshold to dynamically halt HNSW graph exploration, exiting early when exploring new candidates is unlikely to add new neighbors.
The new collector records the number of neighbors added while exploring each new candidate (an HNSW node) and compares it with the number added while exploring the previous candidate. When the rate of added neighbors plateaus for a number of consecutive iterations, it stops graph exploration (earlyTerminate returns true).
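
The plateau check described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the class name `SaturationTracker`, its method names, and the exact ratio used to measure saturation are assumptions made for the example.

```java
/**
 * Hypothetical helper (not a Lucene class): tracks how quickly neighbors are
 * being added and signals when exploration has plateaued.
 */
final class SaturationTracker {
  private final double saturationThreshold; // assumed to be a ratio in [0, 1]
  private final int patience; // consecutive saturated candidates tolerated
  private int previousTotal; // neighbors added up to the previous candidate
  private int currentTotal; // neighbors added so far
  private int saturatedStreak;
  private boolean terminate;

  SaturationTracker(double saturationThreshold, int patience) {
    this.saturationThreshold = saturationThreshold;
    this.patience = patience;
  }

  /** Call whenever a neighbor is actually added to the result queue. */
  void onNeighborAdded() {
    currentTotal++;
  }

  /** Call each time the search moves on to a new candidate node. */
  void nextCandidate() {
    // Ratio of the total before the candidate just explored to the total
    // after it: 1.0 means that candidate added nothing new.
    double saturation = currentTotal == 0 ? 0.0 : (double) previousTotal / currentTotal;
    previousTotal = currentTotal;
    if (saturation >= saturationThreshold) {
      saturatedStreak++;
    } else {
      saturatedStreak = 0; // progress was made, so start counting again
    }
    if (saturatedStreak > patience) {
      terminate = true;
    }
  }

  boolean earlyTerminate() {
    return terminate;
  }
}
```

In this sketch a single productive candidate resets the streak, so termination only happens after more than `patience` consecutive candidates that barely changed the queue.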

lucene/core/src/java/org/apache/lucene/search/HnswQueueSaturationCollector.java

tteofili · 2025-01-15T13:42:28Z

This sample graph (from Cohere-768) shows how the collection of nearest neighbors saturates, so it makes sense to stop visiting the graph earlier, e.g., when the saturation counter exceeds a given threshold.

lucene/core/src/test/org/apache/lucene/search/HnswQueueSaturationCollectorTest.java

benwtrent · 2025-01-27T14:03:36Z

lucene/core/src/java/org/apache/lucene/search/HnswKnnCollector.java

+public interface HnswKnnCollector extends KnnCollector {
+
+  /** Indicates exploration of the next HNSW candidate graph node. */
+  void nextCandidate();
+}


I think this kind of collector is OK. But it makes most sense to me to be a delegate collector. An abstract collector to KnnCollector.Delegate.

Then, I also think that OrdinalTranslatingKnnCollector should inherit directly from HnswKnnCollector, always assuming that the passed-in collector is an HnswKnnCollector.

Note, the default behavior for HnswKnnCollector#nextCandidate can simply be a no-op, allowing for overriding.

This might require a new HnswGraphSearcher#search method to keep the old collector behavior, but it can be simple: add a new one that accepts an HnswKnnCollector, and have the existing one delegate to it with new HnswKnnCollector(KnnCollector delegate).

I adjusted my refactoring for the seeded queries similarly. It seems nicer IMO: #14170
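
The delegate-style design suggested in this comment could look roughly like the sketch below. It is only an illustration under stated assumptions: `MiniKnnCollector` is a reduced stand-in that models just two methods rather than Lucene's full KnnCollector, and `MiniHnswKnnCollector` and `MiniSearcher` are hypothetical names, not classes from this PR.

```java
/** Reduced stand-in for a kNN collector; only two methods are modelled here. */
interface MiniKnnCollector {
  boolean collect(int docId, float similarity);

  boolean earlyTerminated();
}

/** Decorator that forwards to a delegate and adds an HNSW-specific hook. */
class MiniHnswKnnCollector implements MiniKnnCollector {
  protected final MiniKnnCollector delegate;

  MiniHnswKnnCollector(MiniKnnCollector delegate) {
    this.delegate = delegate;
  }

  @Override
  public boolean collect(int docId, float similarity) {
    return delegate.collect(docId, similarity);
  }

  @Override
  public boolean earlyTerminated() {
    return delegate.earlyTerminated();
  }

  /** Called before each candidate is explored; does nothing unless overridden. */
  public void nextCandidate() {}
}

/** Hypothetical searcher showing the old and the new entry point. */
final class MiniSearcher {
  /** Old entry point: wraps a plain collector so existing callers keep working. */
  static int search(int[] candidates, MiniKnnCollector collector) {
    return search(candidates, new MiniHnswKnnCollector(collector));
  }

  /** New entry point: notifies the collector of each candidate; returns how many were explored. */
  static int search(int[] candidates, MiniHnswKnnCollector collector) {
    int explored = 0;
    for (int candidate : candidates) {
      if (collector.earlyTerminated()) {
        break;
      }
      collector.nextCandidate();
      collector.collect(candidate, 1.0f / (1 + explored)); // placeholder similarity
      explored++;
    }
    return explored;
  }
}
```

Because the hook defaults to a no-op, wrapping a plain collector leaves its behavior unchanged, while a subclass can override `nextCandidate()` to observe graph exploration.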

thanks Ben. I'll incorporate your suggestions once #14170 is in.

made HnswKnnCollector a KnnCollector.Decorator in c6dbf7e

tteofili · 2025-02-13T12:24:17Z

updated results (Cohere-768, 200k docs, merge disabled)

baseline

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.946         1.776  200000   100      50       32        100         no    22645    12.45      16070.71            33           593.99        585.938       585.938

candidate

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.948         1.424  200000   100      50       32        100         no    19507    12.49      16014.09            33           593.98        585.938       585.938
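
To put the two rows above side by side, the relative change can be computed as in this small sketch. The helper name `RelativeChange` is made up for the example; the inputs are the latency and visited values copied from the baseline and candidate rows.

```java
/** Hypothetical helper for comparing a baseline measurement with a candidate one. */
final class RelativeChange {
  /** Percentage reduction from baseline to candidate (positive means the candidate is lower). */
  static double reductionPercent(double baseline, double candidate) {
    return 100.0 * (baseline - candidate) / baseline;
  }

  public static void main(String[] args) {
    // Values copied from the baseline and candidate rows above.
    System.out.printf("latency reduction: %.1f%%%n", reductionPercent(1.776, 1.424));
    System.out.printf("visited reduction: %.1f%%%n", reductionPercent(22645, 19507));
  }
}
```

For this run that works out to about 19.8% lower latency and about 13.9% fewer visited nodes, with recall moving from 0.946 to 0.948.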

tteofili · 2025-02-13T12:25:35Z

reference paper

tteofili · 2025-02-25T11:21:38Z

I've updated this so that the early termination logic no longer kicks in by default; it is now enabled by wrapping a query in a PatienceKnnVectorQuery.
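
The opt-in idea (early termination only when a query is explicitly wrapped) can be sketched as below. This does not show the real PatienceKnnVectorQuery API: every type here (`SketchCollector`, `SketchKnnQuery`, `PlainKnnQuery`, `PatienceQuerySketch`) is a hypothetical stand-in, and the wrapper uses a simplified rule that counts candidates which added nothing instead of a saturation ratio.

```java
/** Stand-in collector type for this sketch; not Lucene's KnnCollector. */
interface SketchCollector {
  boolean collect(int docId, float score); // true if the hit was competitive

  void nextCandidate();

  boolean earlyTerminated();
}

/** Stand-in for a kNN query: its only job here is to create a collector. */
interface SketchKnnQuery {
  SketchCollector newCollector();
}

/** Default query: keeps the best score seen and never terminates early. */
final class PlainKnnQuery implements SketchKnnQuery {
  @Override
  public SketchCollector newCollector() {
    return new SketchCollector() {
      private float best = Float.NEGATIVE_INFINITY;

      @Override
      public boolean collect(int docId, float score) {
        if (score > best) {
          best = score;
          return true;
        }
        return false;
      }

      @Override
      public void nextCandidate() {}

      @Override
      public boolean earlyTerminated() {
        return false;
      }
    };
  }
}

/** Opt-in wrapper: decorates the inner query's collector with a patience counter. */
final class PatienceQuerySketch implements SketchKnnQuery {
  private final SketchKnnQuery inner;
  private final int patience;

  PatienceQuerySketch(SketchKnnQuery inner, int patience) {
    this.inner = inner;
    this.patience = patience;
  }

  @Override
  public SketchCollector newCollector() {
    SketchCollector delegate = inner.newCollector();
    return new SketchCollector() {
      // Starts as true so the very first candidate is not counted as unproductive.
      private boolean addedSinceLastCandidate = true;
      private int unproductive;

      @Override
      public boolean collect(int docId, float score) {
        boolean added = delegate.collect(docId, score);
        addedSinceLastCandidate |= added;
        return added;
      }

      @Override
      public void nextCandidate() {
        // Simplification: count consecutive candidates that added nothing.
        unproductive = addedSinceLastCandidate ? 0 : unproductive + 1;
        addedSinceLastCandidate = false;
        delegate.nextCandidate();
      }

      @Override
      public boolean earlyTerminated() {
        return unproductive > patience || delegate.earlyTerminated();
      }
    };
  }
}
```

With this shape, callers that keep using the plain query see no behavior change, and only callers that wrap it accept the recall/latency trade-off.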

tteofili · 2025-02-25T11:25:29Z

Updated luceneutil benchmarks with different parameters (Cohere-768, ndoc=200k).

maxconn=32

baseline

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.963         1.262  200000   100      50       32        100         no     7984    61.61       3246.12             3           595.73        585.938       585.938

candidate@default

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.957         1.165  200000   100      50       32        100         no     7171    65.61       3048.36             3           595.73        585.938       585.938
 0.958         1.164  200000   100      50       32        100         no     7214    63.45       3151.99             3           595.68        585.938       585.938

candidate@sat=0.995,patience=maxconn(32)

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.952         1.082  200000   100      50       32        100         no     6623    61.71       3240.76             3           595.71        585.938       585.938
 0.952         1.119  200000   100      50       32        100         no     6682    61.04       3276.65             3           595.70        585.938       585.938

candidate@sat=0.95,patience=fanout(50)

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.908         0.781  200000   100      50       32        100         no     4499    61.82       3235.30             3           595.72        585.938       585.938
 0.909         0.779  200000   100      50       32        100         no     4498    62.09       3221.34             3           595.69        585.938       585.938

candidate@sat=0.995,patience=fanout(50)

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.960         1.178  200000   100      50       32        100         no     7361    63.65       3142.33             3           595.75        585.938       585.938
 0.960         1.195  200000   100      50       32        100         no     7441    62.18       3216.42             3           595.76        585.938       585.938

maxconn=64

baseline

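recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.968         1.351  200000   100      50       64        100         no     8698    63.04       3172.79             3           595.76        585.938       585.938
 0.968         1.328  200000   100      50       64        100         no     8744    62.29       3210.94             3           595.77        585.938       585.938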

candidate@default

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.960         1.193  200000   100      50       64        100         no     7751    62.79       3185.27             3           595.76        585.938       585.938
 0.961         1.213  200000   100      50       64        100         no     7789    61.73       3240.02             3           595.73        585.938       585.938

candidate@sat=0.995,patience=maxconn(64)

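recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.965         1.282  200000   100      50       64        100         no     8364    62.86       3181.88             3           595.72        585.938       585.938
 0.964         1.274  200000   100      50       64        100         no     8361    62.42       3203.90             3           595.79        585.938       585.938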

candidate@sat=0.95,patience=fanout(50)

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.913         0.863  200000   100      50       64        100         no     4945    63.00       3174.80             3           595.81        585.938       585.938
 0.916         0.797  200000   100      50       64        100         no     4965    62.70       3189.89             3           595.78        585.938       585.938

candidate@sat=0.995,patience=fanout(50)

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.962         1.226  200000   100      50       64        100         no     7991    64.13       3118.52             3           595.82        585.938       585.938
 0.963         1.236  200000   100      50       64        100         no     7859    62.52       3199.23             3           595.80        585.938       585.938

svilen-mihaylov-elastic · 2025-03-12T15:01:07Z

lucene/core/src/java/org/apache/lucene/search/HnswKnnCollector.java

+    super(collector);
+  }
+
+  /** Indicates exploration of the next HNSW candidate graph node. */


nit: suggest "triggers" instead of "indicates"

svilen-mihaylov-elastic · 2025-03-12T15:03:23Z

lucene/core/src/test/org/apache/lucene/search/TestPatienceFloatVectorQuery.java

+
+    @Override
+    public String toString(String field) {
+      return null;


Perhaps return a name here? It may make debugging easier in the future.

tteofili added 4 commits December 24, 2024 15:06

Add a HNSW early termination based on nn queue saturation (3b30c07)
enable optimized collector with 1k+ docs (0b24e79)
Merge branch 'main' of github.com:apache/lucene into hnsw_qset (70b6144)
tidy (93fb470)

mayya-sharipova reviewed Jan 3, 2025

lucene/core/src/java/org/apache/lucene/search/HnswQueueSaturationCollector.java (resolved)

mayya-sharipova reviewed Jan 4, 2025

lucene/core/src/java/org/apache/lucene/search/HnswQueueSaturationCollector.java (resolved)

mayya-sharipova reviewed Jan 4, 2025

lucene/core/src/java/org/apache/lucene/search/HnswQueueSaturationCollector.java (outdated, resolved)

tteofili added 5 commits January 8, 2025 16:32

Merge branch 'main' of github.com:apache/lucene into hnsw_qset (c5aa473)
Merge branch 'main' of github.com:apache/lucene into hnsw_qset (7fc49c5)
Merge branch 'main' of github.com:apache/lucene into hnsw_qset (aed6fd5)
don't trigger exact search when early terminating (b7eb24f)
improved javadoc (d143bbb)

tteofili added 5 commits January 15, 2025 14:44

improved javadoc (51df9ee)
improved javadoc (e55f967)
minor fixes, more tests (e3f8db3)
Merge branch 'main' of github.com:apache/lucene into hnsw_qset (ec1e686)
tidy (a71e936)

mayya-sharipova reviewed Jan 15, 2025

lucene/core/src/test/org/apache/lucene/search/HnswQueueSaturationCollectorTest.java (outdated, resolved)

tteofili added 8 commits January 16, 2025 12:10

dropped useless assertions (09b0229)
changes added (74132f1)
Merge branch 'main' of github.com:apache/lucene into hnsw_qset (8d00ae8)
changes to 10.2 (370f513)
more tests (fed77c9)
more tests (88d22df)
minor fixes (e86ebdc)
Merge branch 'main' of github.com:apache/lucene into hnsw_qset (e69730f)

benwtrent reviewed Jan 27, 2025

Merge branch 'main' of github.com:apache/lucene into hnsw_qset (5b001ee)

github-actions bot added the module:core/search label Feb 3, 2025

github-actions bot added the module:core/codecs and module:core/hnsw labels Feb 3, 2025

tteofili added 3 commits February 3, 2025 14:08

tidy (20a481f)
Merge branch 'main' of github.com:apache/lucene into hnsw_qset (1dbaa1a)
make hnsw collector a decorator (c6dbf7e)

tteofili added 2 commits February 24, 2025 14:36

Merge branch 'main' of github.com:apache/lucene into hnsw_qset (55fdea2)
moved the early termination logic into PatienceKnnVectorQuery (460efd9)

tteofili marked this pull request as ready for review February 25, 2025 11:19

minor fix (3d2e46b)

tteofili added 3 commits February 25, 2025 12:27

Merge branch 'main' of github.com:apache/lucene into hnsw_qset (0f3f047)
updated CHANGES to reflect new query, minor fix (eef4f97)
reverted unneeded change (acf5866)

github-actions bot removed the module:core/codecs label Feb 25, 2025

tteofili added 2 commits February 25, 2025 12:40

tidy (620e985)
Merge branch 'main' of github.com:apache/lucene into hnsw_qset (ca0f05d)

svilen-mihaylov-elastic reviewed Mar 12, 2025

tteofili added 2 commits March 12, 2025 16:11

minor tweaks (f116141)
Merge branch 'main' of github.com:apache/lucene into hnsw_qset (45b2031)
